





Appendix 1: Back-imagination and Back-speech

Neural Information Processing Systems

Figure 1: Illustrative examples of the two proposed techniques, Back-imagination and Back-speech.

Tiny ImageNet [Le and Yang, 2015] is a compact version of the full ImageNet dataset. The Stanford Sentiment Treebank-2 (SST-2) [Socher et al., 2013] is a sentiment classification dataset.

Given the scarcity of datasets for understanding natural language in visual scenes, we introduce a novel textual entailment dataset, Textual Natural Contextual Classification (TNCC). This dataset is built on Crisscrossed Captions [Parekh et al., 2020], an image …

In this work, we use the same experimental configuration for both the textual entailment and the sentiment classification tasks. For the image classification task, we use the ResNet18 [He et al., 2015] model, which is considered better suited to small datasets.


Supplementary Material A: Data Modeling

Neural Information Processing Systems

In this section, we provide further details on our data modeling. In the environments we investigate, the terminal variable is binary while all remaining dimensions are continuous, which makes the terminal variable difficult to model appropriately. This is particularly challenging for "expert" datasets, where early termination is rare. An immediate advantage of sampling data from a generative model is compression: as we discuss in Appendix B.3, sampling is fast, and ER provides high levels of dataset compression without sacrificing downstream performance in offline reinforcement learning.
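To make the binary-versus-continuous mismatch concrete, here is a minimal sketch. It assumes the generative model emits each transition as a single continuous vector with the terminal flag in the last column, and that the flag is recovered by thresholding. The function `split_terminals` and the 0.5 threshold are hypothetical choices for illustration, not the procedure described in this appendix. The toy batch also shows why rare terminals are hard to learn: with one terminal in 1000 transitions, a model that never emits a terminal still matches 99.9% of the flags.

```python
import numpy as np


def split_terminals(samples: np.ndarray, threshold: float = 0.5):
    """Split generated transitions into continuous features and a binary terminal flag.

    Hypothetical post-processing: assumes the terminal flag is generated as a
    continuous value in the LAST column and is rounded to {0, 1} by thresholding.
    """
    continuous = samples[:, :-1]
    terminals = samples[:, -1] >= threshold  # boolean "done" flags
    return continuous, terminals


# Illustrative "expert-like" batch: a single terminal step among 1000 transitions.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 4))
raw_terminal = np.zeros(1000)
raw_terminal[123] = 0.93  # continuous model output for the one terminal step
samples = np.column_stack([features, raw_terminal])

continuous, terminals = split_terminals(samples)
terminal_rate = terminals.mean()
# Accuracy of a degenerate model that never predicts termination.
always_zero_accuracy = 1.0 - terminal_rate
print(continuous.shape, int(terminals.sum()), terminal_rate, always_zero_accuracy)
```

The threshold is a design choice: any continuous output near the boundary gets forced to one side, so a model that underweights the rare terminal class will silently produce almost no terminals.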







Enhancing diffusion models with Gaussianization preprocessing

Cunzhi, Li; Kang, Louis; Shimazaki, Hideaki

arXiv.org Machine Learning

Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2020) have emerged as one of the most powerful classes of generative models for high-dimensional data, achieving state-of-the-art performance in image synthesis (Dhariwal and Nichol, 2021; Rombach et al., 2022) and in other tasks such as action generation in robotics and protein design (Watson et al., 2023; Chi et al., 2025). However, sampling from these models is typically slow: many reverse-time steps are required to transform an initial Gaussian sample into a high-quality sample in data space (Ho et al., 2020; Song et al., 2020). This computational cost restricts the practical deployment of diffusion models, especially in real-time or resource-constrained settings (Salimans and Ho, 2022; Lu et al., 2022). Recent theoretical and empirical studies suggest that this inefficiency is closely related to a dynamical phase transition (bifurcation) that occurs during the reverse process (Raya and Ambrogioni, 2024; Biroli et al., 2024; Ambrogioni, 2025). In the early reverse steps, the trajectories stay near a stable fixed point whose distribution is close to the initial independent Gaussian, and little structure is present in the samples.
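The bifurcation idea can be seen in a one-dimensional toy, sketched below under simplifying assumptions that are not taken from the paper: the data are two equally weighted points at ±m, the noise is variance-exploding (x_σ = x_0 + σ·ε), and the exact score is used instead of a trained network. For these assumptions the noisy density is an equal mixture of N(+m, σ²) and N(-m, σ²), so the curvature of log p_σ at x = 0 is (m²/σ² − 1)/σ². It is negative (x = 0 is the single mode, no structure yet) for σ > m and positive (two separated modes) for σ < m, giving a critical noise level σ_c = m. The helper names are hypothetical, and this illustrates the phase-transition picture only; it is not the Gaussianization preprocessing proposed in this work.

```python
import numpy as np


def score(x, sigma, m=1.0):
    """Exact score d/dx log p_sigma(x) for data at +/-m (equal weights)
    under variance-exploding noise x_sigma = x_0 + sigma * eps."""
    return (-x + m * np.tanh(m * x / sigma**2)) / sigma**2


def curvature_at_origin(sigma, m=1.0):
    """Second derivative of log p_sigma at x = 0.

    Negative: x = 0 is the mode (unimodal, little structure).
    Positive: x = 0 is a density minimum (the two modes have split)."""
    return (-1.0 + m**2 / sigma**2) / sigma**2


def sample_probability_flow(n, m=1.0, sigma_max=10.0, sigma_min=1e-3, steps=2000, seed=0):
    """Euler integration of the VE probability-flow ODE dx/dsigma = -sigma * score,
    run from sigma_max down to sigma_min on a geometric grid."""
    rng = np.random.default_rng(seed)
    # Exact marginal at sigma_max: equal mixture of N(+m, sigma_max^2) and N(-m, sigma_max^2).
    x = rng.choice([-m, m], size=n) + sigma_max * rng.normal(size=n)
    sigmas = np.geomspace(sigma_max, sigma_min, steps + 1)
    for s, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_next - s) * (-s * score(x, s, m))
    return x


print(curvature_at_origin(2.0) < 0, curvature_at_origin(0.5) > 0)  # True True
x = sample_probability_flow(1000)
print(np.abs(np.abs(x) - 1.0).max())  # samples end close to the data points +/-1
```

Every reverse step above σ_c only contracts the samples toward the single central mode; the two-point structure of the data appears only once σ drops below m, which mirrors why many of the early reverse steps contribute little visible structure.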